{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# \ud83d\udcdd Exercise M3.01\n",
    "\n",
    "The goal is to write an exhaustive search to find the best parameters\n",
    "combination maximizing the model generalization performance.\n",
    "\n",
    "Here we use a small subset of the Adult Census dataset to make the code faster\n",
    "to execute. Once your code works on the small subset, try to change\n",
    "`train_size` to a larger value (e.g. 0.8 for 80% instead of 20%)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "adult_census = pd.read_csv(\"../datasets/adult-census.csv\")\n",
    "\n",
    "target_name = \"class\"\n",
    "target = adult_census[target_name]\n",
    "data = adult_census.drop(columns=[target_name, \"education-num\"])\n",
    "\n",
    "data_train, data_test, target_train, target_test = train_test_split(\n",
    "    data, target, train_size=0.2, random_state=42\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.compose import make_column_transformer\n",
    "from sklearn.compose import make_column_selector as selector\n",
    "from sklearn.preprocessing import OrdinalEncoder\n",
    "\n",
    "categorical_preprocessor = OrdinalEncoder(\n",
    "    handle_unknown=\"use_encoded_value\", unknown_value=-1\n",
    ")\n",
    "preprocessor = make_column_transformer(\n",
    "    (categorical_preprocessor, selector(dtype_include=object)),\n",
    "    remainder=\"passthrough\",\n",
    ")\n",
    "\n",
    "from sklearn.ensemble import HistGradientBoostingClassifier\n",
    "from sklearn.pipeline import Pipeline\n",
    "\n",
    "model = Pipeline(\n",
    "    [\n",
    "        (\"preprocessor\", preprocessor),\n",
    "        (\"classifier\", HistGradientBoostingClassifier(random_state=42)),\n",
    "    ]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use the previously defined model (called `model`) and using two nested `for`\n",
    "loops, make a search of the best combinations of the `learning_rate` and\n",
    "`max_leaf_nodes` parameters. In this regard, you need to train and test the\n",
    "model by setting the parameters. The evaluation of the model should be\n",
    "performed using `cross_val_score` on the training set. Use the following\n",
    "parameters search:\n",
    "- `learning_rate` for the values 0.01, 0.1, 1 and 10. This parameter controls\n",
    "  the ability of a new tree to correct the error of the previous sequence of\n",
    "  trees\n",
    "- `max_leaf_nodes` for the values 3, 10, 30. This parameter controls the depth\n",
    "  of each tree."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write your code here."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now use the test set to score the model using the best parameters that we\n",
    "found using cross-validation. You will have to refit the model over the full\n",
    "training set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write your code here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "jupytext": {
   "main_language": "python"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}